

Search for: All records

Creators/Authors contains: "Yang, Yun"


  1. Abstract

    This article considers Bayesian model selection via mean-field (MF) variational approximation. Towards this goal, we study the non-asymptotic properties of MF inference that allows for latent variables and model misspecification. Concretely, we show a Bernstein–von Mises (BvM) theorem for the MF variational distribution under possible model misspecification, which implies the distributional convergence of the MF variational approximation to a normal distribution centered at the maximum likelihood estimator. Motivated by the BvM theorem, we propose a model selection criterion using the evidence lower bound (ELBO), and demonstrate that the model selected by ELBO tends to agree with the one selected by the commonly used Bayesian information criterion (BIC) as the sample size tends to infinity. Compared to BIC, ELBO tends to incur smaller approximation error to the log-marginal likelihood (a.k.a. model evidence) due to a better dimension dependence and full incorporation of the prior information. Moreover, we show the geometric convergence of the coordinate ascent variational inference algorithm, which provides practical guidance on how many iterations one typically needs to run when approximating the ELBO. These findings demonstrate that variational inference is capable of providing a computationally efficient alternative to conventional approaches in tasks beyond obtaining point estimates.

     
  2. In-time particle trajectory reconstruction in the Large Hadron Collider is challenging due to the high collision rate and numerous particle hits. Using graph neural networks (GNNs) on FPGAs has enabled superior accuracy with flexible trajectory classification. However, existing GNN architectures have inefficient resource usage and insufficient parallelism for edge classification. This paper introduces a resource-efficient GNN architecture on FPGAs for low-latency particle tracking. The modular architecture facilitates design scalability to support large graphs. Leveraging the geometric properties of hit detectors further reduces graph complexity and resource usage. Our results on Xilinx UltraScale+ VU9P demonstrate performance improvements of 1625x and 1574x over CPU and GPU, respectively.
  3. Free, publicly-accessible full text available June 1, 2024
  4. Agriculture is a major water user, especially in dry and drought-prone areas that rely on irrigation to support agricultural production. In recent years, the over-extraction of groundwater, exacerbated by climate change, population growth, and intensive agricultural irrigation, has led to a drop in water levels and influenced the hydrological cycle. Understanding changes in hydrological processes is essential for pursuing water sustainability. This study aims to estimate the amount and impact of irrigation on hydrological processes in two breadbasket regions, Jing-Jin-Ji (JJJ), China, and northern Texas (NTX), US. We used the Soil and Water Assessment Tool (SWAT) to explore spatiotemporal variations of irrigation from 2008 to 2013 and compared changes in hydrological processes caused by irrigation. The results indicated that deficit irrigation is more common in JJJ than in NTX and can reduce irrigation water use by approximately 50% in areas with intensively irrigated cropland. The applied irrigation varies less over time in NTX but fluctuates in JJJ. Compared with NTX, the higher irrigation intensity in JJJ results in a more significant change in downstream peak streamflow of around 6 m³/s. Moreover, the difference in crop growing seasons can lead to different impacts of irrigation on hydrological processes. For example, the greatest percentage change in surface runoff under the real-world scenario relative to the no-irrigation scenario was around 40% in both JJJ and NTX. However, the peak change occurred at different times, coinciding with the approaching maturity of winter wheat in May in JJJ and of corn in August in NTX. The great potential to reduce groundwater extraction by adopting water conservation irrigation techniques calls for policies and regulations to help farmers shift towards more sustainable water management practices.
  5. The function, structure, and mechanical properties of protein materials make them well-suited for a range of applications such as biosensors and biomaterials. Unlike in traditional polymer synthesis, their sequences are defined and, in the case of recombinant proteins, dictated by the chosen DNA sequence. As DNA synthesis has rapidly progressed over the past twenty years, the limiting bottleneck in protein materials development is the empirical optimization of protein expression. Herein, a low-cost, automated, high-throughput, combinatorial protein expression platform is developed to test permutations of DNA vectors and Escherichia coli (E. coli) strains in a 96-well plate format. Growth and expression are monitored with optical density at 600 nm (OD600) to measure growth, Bradford assays to establish the total protein concentration, and dot blot assays to determine the concentration of the protein of interest. With an eye toward accessibility for researchers without suites of biosynthetic equipment, automated camera-based assays are validated for the OD600 assay, via turbidimetry, and the Bradford assay, via colorimetry. High-yield expression conditions can be determined within a week. Notably, in several cases, previously un-expressible proteins are expressed successfully in viable yields. Collectively, an efficient approach to overcoming long-running synthesis challenges in protein materials development is established, which will expedite materials innovation.
  6. Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice, with poor scalability on large (or even moderate) datasets. In this paper, we introduce a linear-time algorithm for approximating an SDP-relaxed K-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys an exact recovery threshold similar to that of the K-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive with enhanced theoretical properties when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to the original K-means SDP with substantially reduced runtime.
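
The sketch-and-lift idea in item 6 (cluster a small subsample, then propagate the result to every point by nearest-centroid rounding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the SDP step is swapped for plain Lloyd iterations with farthest-point initialization so that the example stays dependency-free, and all names (`cluster_subsample`, `sketch_and_lift`, `sample_size`) and the toy data are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def cluster_subsample(points, k, iters=20):
    """Stand-in for the SDP step (assumption): Lloyd's iterations
    started from a farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from the centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[j].append(p)
        # Keep the old centroid if a group happens to be empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def sketch_and_lift(data, k, sample_size, seed=0):
    """Cluster a random subsample, then label all points by nearest centroid."""
    rng = random.Random(seed)
    sketch = rng.sample(data, sample_size)       # "sketch": subsample the data
    centroids = cluster_subsample(sketch, k)     # solve the small problem
    labels = [                                   # "lift": nearest-centroid rounding
        min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data
    ]
    return labels, centroids


# Toy data (hypothetical): two well-separated Gaussian clusters in the plane.
rng = random.Random(1)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)] + [
    (rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)
]
labels, centroids = sketch_and_lift(data, k=2, sample_size=40)
```

For a fixed `sample_size`, the expensive clustering step does not grow with the dataset, and the lift step needs only one distance evaluation per point and centroid, which is where the linear dependence on the number of points comes from in this simplified version.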